Systematic Entomology

Wiley

Preprints posted in the last 30 days, ranked by how well they match Systematic Entomology's content profile, based on 11 papers previously published here. The average preprint has a 0.00% match score for this journal, so anything above that is already an above-average fit.

1
TaxonMatch: taxonomic integration and tree construction from heterogeneous biological databases

Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.

2026-03-20 evolutionary biology 10.64898/2026.03.18.712418 medRxiv
Top 0.1%
1.2%

Integrating taxonomic data from various sources presents a significant challenge in biodiversity research, due to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories like the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species with molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
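
To make the idea of reconciling misspelled or inconsistently formatted names concrete, here is a minimal sketch of fuzzy name matching using only Python's standard library. It is not TaxonMatch's method or API: the helper names (`normalize`, `match_names`), the example name lists, and the 0.85 similarity cutoff are all assumptions chosen for illustration, and the sketch does not attempt synonym resolution.

```python
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(name.lower().split())


def match_names(queries, reference, cutoff=0.85):
    """Map each query name to its closest reference name, or None if nothing is close enough.

    `cutoff` is an illustrative similarity threshold (an assumption, not a tuned value).
    """
    ref_by_norm = {normalize(r): r for r in reference}
    result = {}
    for q in queries:
        hits = get_close_matches(normalize(q), list(ref_by_norm), n=1, cutoff=cutoff)
        result[q] = ref_by_norm[hits[0]] if hits else None
    return result


# Hypothetical example data: one typo, one formatting difference, one unmatched name.
reference = ["Apis mellifera", "Aedes albopictus", "Bombus terrestris"]
queries = ["Apis melifera", "aedes  albopictus", "Drosophila melanogaster"]
matches = match_names(queries, reference)
```

A plain string-similarity cutoff like this can only catch typographical variants; names that differ because of synonymy or reclassification would need an explicit synonym table, which is outside the scope of the sketch.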

2
Phylogenomics of the mega genus Bulbophyllum (Orchidaceae) and implications for its infrageneric classification

Nanjala, C.; Simpson, L.; Hu, A.-Q.; Patel, V.; Nicholls, J. A.; Bent, S. J.; Gale, S. W.; Fischer, G. A.; Goedderz, S.; Schuiteman, A.; Crayn, D.; Clements, M. A.; Nargar, K.

2026-04-01 evolutionary biology 10.64898/2026.03.30.715161 medRxiv
Top 0.1%
1.0%

Understanding evolutionary relationships in hyperdiverse plant groups remains a major challenge in systematics. The orchid genus Bulbophyllum, the second largest genus of flowering plants, represents an exceptional example of phylogenetic and morphological complexity. Relationships, particularly within the species-rich Asian clade, have remained poorly resolved due to extensive morphological variation and limited resolution in previous phylogenetic studies. Here, we reconstructed phylogenetic relationships using 63 plastid genes from 355 specimens representing 322 species and 65 of the 97 recognised sections of Bulbophyllum. Our analyses confirmed that the genus comprises five major evolutionary lineages, composed of species predominantly from Australasia, Madagascar, Continental Africa, the Neotropics, and Asia. We provide the first robust phylogenetic evidence for a dichotomous split within the Asian clade into two well-supported lineages: the Asian-Malesian clade and the Malesian-Papuasian clade, with the latter containing a strongly supported Papuasian subclade. Additionally, this study supports the monophyly of several currently recognised sections while clarifying relationships in previously problematic groups. This study provides the most comprehensive plastid-based phylogenomic framework for Bulbophyllum to date and establishes a foundation for future taxonomic revision and integrative analyses of diversification and trait evolution within this hyperdiverse genus.

3
Reassessing display behavior from Bels et al. (2025) given the complexity of anthropogenic hybridization and intraspecific diversity in Iguana iguana

van den Burg, M. P.; Thibaudier, J.

2026-03-23 zoology 10.64898/2026.03.19.713079 medRxiv
Top 0.1%
0.8%

Understanding behavioral differences between non-native and closely related endangered species could be important to aid conservation management. In volume 169 of Zoology, Bels et al. (2025) reported on their comparison of display-action-patterns (DAP) between native Iguana delicatissima and non-native iguanas present on islands of the Guadeloupe Archipelago in the Caribbean Lesser Antilles. Here, we address conceptual and methodological concerns about their work and reanalyze their data given our proposed corrections, primarily a literature-informed adjustment of their "species" category. We additionally utilize online videos from South American mainland I. iguana populations, from where the non-native iguanas in the Guadeloupe Archipelago originate, to better understand the different DAPs between native and non-native iguanas in the Guadeloupe Archipelago. Significant differences in DAP characteristics among "species" categories (native I. delicatissima, non-native iguanas, and hybrids) show that Bels et al. (2025) oversimplified their data analyses by merging all non-native populations into one group. This result indicates the presence of behavioral variation among subpopulations within widely hybridizing iguanid populations, which has been poorly studied. Additionally, videos from mainland populations across two major mitochondrial clades of Iguana iguana show that non-native iguanas on Guadeloupe retained DAP characteristics of those populations from which they originate. We discuss these findings in light of the proposed hypotheses put forward by Bels et al. (2025), of which two can be excluded. Overall, our reanalysis shows that studies focusing on characteristics within settings of complex hybridization in diverse species should acknowledge this complexity.

4
Ancestral state reconstruction with discrete characters using deep learning

Nagel, A. A.; Landis, M. J.

2026-03-21 evolutionary biology 10.64898/2026.03.19.712918 medRxiv
Top 0.1%
0.7%

Ancestral state reconstruction is a classical problem of broad relevance in phylogenetics. Likelihood-based methods for reconstructing ancestral states under discrete character models, such as Markov models, have proven extremely useful, but only work so long as the assumed model yields a tractable likelihood function. Unfortunately, extending a simple but tractable phylogenetic model to possess new, but biologically realistic, properties often results in an intractable likelihood, preventing its use in standard modeling tasks, including ancestral state reconstruction. The rapid advancement of deep learning offers a potential alternative to likelihood-based inference of ancestral states, particularly for models with intractable likelihoods. In this study, we modify the phylogenetic deep learning software phyddle to conduct ancestral state reconstruction. We evaluate phyddle's performance under various methodological and modeling conditions, while comparing to Bayesian inference when possible. For simple models and small trees, its performance resembles the performance of Bayesian inference, but worsens as tree size increases. While phyddle still performs adequately for more complex models, such as speciation and extinction models, the estimates differ more from Bayesian inference in comparison with simpler models. Lastly, we use phyddle to infer ancestral states for two empirical datasets: one on the ancestral ranges of a subclade of the genus Liolaemus, and one on ancestral locations for sequences from the 2014 Sierra Leone Ebola virus disease outbreak.

5
Assessing the potential of bee-collected pollen sequence data to train machine learning models for geolocation of sample origin

Hayes, R. A.; Kern, A. D.; Ponisio, L. C.

2026-04-01 bioinformatics 10.64898/2026.03.29.715128 medRxiv
Top 0.1%
0.7%

Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized due to the difficulty of distinguishing many pollen taxa beyond the family level and limited pollen reference material to support location predictions. With recent developments in pollen DNA metabarcoding these issues have been rectified, but much of the available pollen data are primarily from wind-pollinated species, which are widespread and less informative of specific sample locations. Bee-collected pollen presents an untapped resource in training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative abundance data from three projects in the western U.S. and assessed the accuracy of supervised machine learning models to predict the location of sample origin based solely on pollen assemblage, without the need of incorporating additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen assigned sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location and avoid the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts. 
Highlights
- Bee-collected pollen metabarcoding data was used to accurately predict sample origin
- Random Forest and k-Nearest Neighbors algorithms were most accurate with lowest error
- Taxonomically-classified and raw DNA sequence data training sets performed comparably

6
k-Nearest Common Leaves algorithm for phylogenetic tree completion

Koshkarov, A.; Tahiri, N.

2026-04-04 evolutionary biology 10.64898/2026.04.02.716144 medRxiv
Top 0.1%
0.5%

Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n^2), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
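
For readers unfamiliar with the pruning baseline that the abstract contrasts with tree completion, the sketch below computes a rooted Robinson-Foulds distance after restricting both trees to their common leaves. It is not k-NCL and ignores branch lengths entirely. The nested-tuple tree representation and all function names are assumptions made for this illustration.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clusters(tree):
    """Collect non-trivial clusters: leaf sets of internal nodes, excluding the root."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    root = walk(tree)
    found.discard(root)
    return found


def rf_on_common_leaves(t1, t2):
    """Rooted RF distance after pruning both trees to their shared leaves."""
    common = leaves(t1) & leaves(t2)
    if len(common) < 3:
        return 0  # too few shared leaves for any non-trivial cluster
    return len(clusters(prune(t1, common)) ^ clusters(prune(t2, common)))


# Hypothetical trees sharing leaves A-D; E and F each occur in only one tree.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "F"))
distance = rf_on_common_leaves(t1, t2)
```

In this toy case the placement of E and F plays no part in the result, which is the information loss that motivates completion methods.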

7
Estimating Bayesian phylogenetic information content using geodesic distances

Milkey, A.; Lewis, P. O.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715656 medRxiv
Top 0.2%
0.5%

A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
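
The posterior-to-prior variance ratio can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the authors' estimator is not given in the abstract, so the pairwise-distance dispersion below is a stand-in, the absolute-difference distance on numbers replaces geodesic distance in treespace, and the names `dispersion` and `variance_ratio` are hypothetical.

```python
from itertools import combinations


def dispersion(sample, dist):
    """Variance-like dispersion: sum of squared pairwise distances over twice the pair count.

    For points on a line with dist = absolute difference, this equals the
    usual (unbiased) sample variance.
    """
    pairs = list(combinations(sample, 2))
    return sum(dist(a, b) ** 2 for a, b in pairs) / (2 * len(pairs))


def variance_ratio(posterior, prior, dist):
    """Ratio of posterior to prior dispersion: near 1 = uninformative, near 0 = informative."""
    return dispersion(posterior, dist) / dispersion(prior, dist)


def toy_distance(a, b):
    """Assumption: absolute difference stands in for a geodesic distance between trees."""
    return abs(a - b)


# Hypothetical samples: the posterior is much more concentrated than the prior.
prior_sample = [0, 2, 4, 6, 8]
posterior_sample = [3, 4, 5]
ratio = variance_ratio(posterior_sample, prior_sample, toy_distance)
```

Passing the distance in as a function keeps the calculation independent of how trees are represented, so a real treespace distance could be substituted without changing the rest of the sketch.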

8
Transcriptomes resolve phylogenetic relationships and reveal undescribed diversity in taildropper slugs (Genus Prophysaon)

Smith, M. L.; Moshier, S.; Shoobs, N. F.

2026-03-27 evolutionary biology 10.64898/2026.03.25.713997 medRxiv
Top 0.2%
0.4%

The temperate rainforests of the Pacific Northwest of North America harbor many endemic taxa whose evolutionary histories have been shaped by major climatic and geologic events. The enigmatic taildropper slugs (genus Prophysaon) are one example, notable for their ability to autotomize their tails to escape predators. Despite extensive work uncovering the evolutionary history of individual lineages, relationships among the nine recognized species of Prophysaon remain poorly understood due to insufficient molecular data. To address this, we collected transcriptomes for six of the nine currently accepted species of Prophysaon. Using these data, we were able to resolve species relationships, calling into question the existing subgeneric classification based on morphology. We also detected undescribed phenotypic diversity within the P. andersonii--P. foliolatum species complex, with molecular data supporting the distinctness of two phenotypically distinct populations from Washington. Finally, our transcriptomic data suggest a moderate role of introgression in shaping the evolutionary history of Prophysaon. Here, we synonymize the subgenus Mimetarion with nominotypical Prophysaon. Future work should further investigate whether the undescribed diversity detected here represents species-level differentiation.

9
How much information is there for inferring species trees?

Milkey, A.; Chen, J.; Lewis, P. O.

2026-04-02 evolutionary biology 10.64898/2026.04.01.715836 medRxiv
Top 0.2%
0.3%

As modern phylogenomics datasets become increasingly large, it is useful to develop recommendations for how to subsample datasets for best species tree inference. Here we apply a new measure of phylogenetic information content that estimates the reduction in tree space occupied by a posterior sample of inferred trees relative to a prior sample in order to assess the effects of gene tree parameters on species tree estimation. We find that, consistent with earlier studies, when data are informative, more data result in better species tree inference. However, when data are uninformative, subsampling a dataset to include only the most informative loci may produce a better species tree sample. We perform analyses on a variety of simulated and empirical datasets.

10
Forensic Identification of Confiscated Helmeted Hornbill (Rhinoplax vigil) Casques and Implications for Individual Quantification in Wildlife Crimes

Shen, Y.; He, K.; Wang, W.; Huang, L.; Chen, J.

2026-04-06 zoology 10.64898/2026.04.02.715475 medRxiv
Top 0.2%
0.2%

In wildlife forensic practice, species identification and estimation of the Minimum Number of Individuals (MNI) for highly processed specimens have long relied on weight-based conversion methods, which may result in underestimation of the number of individuals involved in a case. Focusing on confiscated casque products of the helmeted hornbill (Rhinoplax vigil), this study combines macroscopic morphological examination with mitochondrial DNA barcoding (16S rRNA, COI, and Cytb) to explore a more robust approach for individual quantification. The results demonstrate that the conventional "weight-based" approach overlooks critical biological information contained in anatomical structures and cannot accurately reflect the actual number of individuals involved. Based on this, we propose an anatomy-based criterion centered on the principle of structural uniqueness: specimens retaining biologically unique beak or casque structures should be directly assigned to a single individual, whereas weight-based estimation should only be applied when original anatomical features are entirely absent. In addition, considering material loss during processing, we propose approximately 85 g as a reference threshold for estimating the number of individuals in heavily processed solid casque products. This approach improves the scientific rigor and accuracy of forensic identification and provides reliable technical support for the conviction, sentencing, and law enforcement of wildlife trafficking cases involving helmeted hornbill and other endangered species.

11
Accurate identification of invasive Aedes mosquito species using low-cost imaging and geometric wing morphometrics

Sauer, F. G.; Joest, H.; Sulesco, T.; Duve, P.; Loc, D. H.; Nolte, K.; Luehken, R.

2026-04-10 zoology 10.64898/2026.04.08.717229 medRxiv
Top 0.3%
0.2%

Accurate species identification is crucial to assess the medical and veterinary relevance of a mosquito specimen, but it requires highly experienced observers and well-equipped laboratories. This study aimed to evaluate whether low-cost imaging in combination with geometric wing morphometrics can provide accurate identification of invasive, morphologically similar Aedes species. The right wings of 670 female specimens, covering 184 Ae. aegypti, 156 Ae. albopictus, 166 Ae. j. japonicus and 164 Ae. koreicus, were removed, mounted and photographed with a professional stereomicroscope (Olympus SZ61, Olympus, Tokyo, Japan) and a macro lens (Apexel-24XMH, Apexel, Shenzhen, China) attached to a smartphone. The coordinates of 18 landmarks on the vein crosses were digitized by a single observer for each image. In addition, the landmarks of 20 specimens per species and imaging device were digitized by six different observers to assess the degree of observer error. The superimposed shape variables were used to compare the species classification accuracy of linear discriminant analysis (LDA), support vector machine (SVM), Random Forest (RF), and XGBoost. In the single-observer landmark data, LDA achieved the best classification results, with a mean accuracy of 95% for landmarks from microscope images and 92% for those obtained from smartphone images. In the multi-observer landmark data, LDA consistently performed worse than the other three classifiers, and the reduction in accuracy was more pronounced for smartphone images than for microscope images. This pattern was associated with a higher degree of observer error for smartphone images, as confirmed by a landmark-wise comparison across all landmarks. Geometric wing morphometrics provides a reliable method to distinguish the most common invasive Aedes species in Europe. The image quality obtained by smartphones equipped with a macro lens is sufficient for this purpose and represents a cost-effective alternative to professional microscopes. However, due to the greater degree of observer variation for smartphone images, landmark coordinates for such images should ideally be collected by a single observer.

12
Re-evaluating the eukaryotic Tree of Life with independent phylogenomic data

Leroy, R. B.; Eme, L.; Lopez-Garcia, P.; Moreira, D.

2026-04-10 evolutionary biology 10.64898/2026.04.08.717153 medRxiv
Top 0.3%
0.2%

Understanding the phylogenetic relationships among eukaryotic lineages is essential for tracing the evolution of key phenotypic traits and inferring the nature of the Last Eukaryotic Common Ancestor. While phylogenomic analyses have clustered eukaryotic taxa into several well-supported major supergroups, the relationships among them remain largely uncertain. Phylogenetic signal erosion over deep time and limited available taxon sampling are among the possible causes. However, most previous studies rely on variations of the same core protein dataset, hence containing the same potential systematic biases. Here, we reconstructed the eukaryotic Tree of Life using a largely independent, marker-rich dataset derived from highly conserved Benchmarking Universal Single-Copy Orthologs. Unlike previous collections, our 277-marker supermatrix minimizes ribosomal protein representation and shares less than 25% overlap with previous datasets. State-of-the-art analyses of this dataset confirm most eukaryotic supergroups previously observed, but suggest different positions for some lineages. Notably, Telonemia clusters with Haptophyta rather than SAR (Stramenopiles-Alveolata-Rhizaria), and Ancyromonadida and Malawimonadida form a monophyletic group at the base of the Opimoda. Our results highlight the importance of analyzing independent phylogenomic datasets and support the hypothesis that extant eukaryotic diversity encompasses a small number of high-rank, supergroup lineages.

13
New insights into the postcranial morphology of Lithornis vulturinus from the Eocene London Clay

Widrig, K.; Field, D. J.

2026-03-19 paleontology 10.64898/2026.03.17.711321 medRxiv
Top 0.4%
0.1%

The deepest phylogenetic divergence within crown birds (Neornithes) is that between the reciprocally monophyletic Palaeognathae and Neognathae. Extant palaeognath diversity comprises the iconic flightless "ratites" (ostriches, rhea, kiwi, cassowaries, and emu), as well as 46 species of volant tinamous in Central and South America (Billerman et al., 2020). Although the earliest stages of palaeognath evolution remain shrouded in mystery due to a sparse fossil record, a group of apparently volant extinct palaeognaths from the Paleogene of Europe and North America, the lithornithids, can help to clarify palaeognath origins. Here, we use high resolution microCT scanning to characterize the morphology of two lithornithid specimens from the early Eocene (Ypresian) London Clay Formation: the neotype of Lithornis vulturinus (NHMUK A5204), from the Isle of Sheppey, Kent, England, and a newly discovered clay nodule containing lithornithid postcranial remains from the nearby locality of Seasalter. This three-dimensional dataset reveals bones from the L. vulturinus neotype that are partially or completely covered by matrix, allowing us to redescribe this critical specimen in new detail and present a revised differential diagnosis of L. vulturinus. We refer the new specimen from Seasalter to L. vulturinus on the basis of apomorphies such as a proximally directed lateral process of the coracoid, caudally divergent lateral margins of the sternum, an arcuate deltopectoral crest, as well as its provenance from a nearby penecontemporaneous locality. The Seasalter specimen contains abundant postcranial material that provides new insight into bones damaged or missing in the neotype, including two undamaged scapulae bearing the hooked acromion that is a diagnostic feature of lithornithids, two complete coracoids, and a nearly complete three-dimensionally preserved sternum. Its estimated body mass is one third larger than that of the neotype, indicating intraspecific variation within L. vulturinus that may reflect sexual dimorphism. Molecular divergence dates and Cretaceous neognath fossils indicate the presence of total-clade palaeognaths before the K-Pg mass extinction event; detailed anatomical descriptions of Paleogene palaeognaths will assist in the identification of the first total-clade palaeognaths from the Cretaceous, and provide insight into how and when flight was independently lost among Cenozoic crown palaeognaths.

14
On the Comparison of LGT networks and Tree-based Networks

Marchand, B.; Tahiri, N.; Tremblay-Savard, O.; Lafond, M.

2026-04-01 bioinformatics 10.1101/2025.11.20.689557 medRxiv
Top 0.4%
0.1%

Phylogenetic networks are widespread representations of evolutionary histories for taxa that undergo hybridization or Lateral-Gene Transfer (LGT) events. There are now many tools to reconstruct such networks, but no clearly established metric to compare them. Such metrics are needed, for example, to evaluate predictions against a simulated ground truth. Despite years of effort in developing metrics, known dissimilarity measures either do not distinguish all pairs of different networks, or are extremely difficult to compute. Since it appears challenging, if not impossible, to create the ideal metric for all classes of networks, it may be relevant to design them for specialized applications. In this article, we introduce a metric on LGT networks, which consist of trees with additional arcs that represent lateral gene transfer events. Our metric is based on edit operations, namely the addition/removal of transfer arcs, and the contraction/expansion of arcs of the base tree, allowing it to connect the space of all LGT networks. We show that it is linear-time computable if the order of transfers along a branch is unconstrained but NP-hard otherwise, in which case we provide a fixed-parameter tractable (FPT) algorithm in the level. We implemented our algorithms and demonstrate their applicability on three numerical experiments. Full online version: https://www.biorxiv.org/content/10.1101/2025.11.20.689557

15
Variable Resolution Maps (VRM) in CCTBX and Phenix: Accounting For Local Resolution In cryoEM

Afonine, P.; Adams, P. D.; Urzhumtsev, A. G.

2026-03-28 bioinformatics 10.64898/2026.03.25.714315 medRxiv
Top 0.4%
0.1%

Calculation of density maps from atomic models is essential for structural studies using crystallography and electron cryo-microscopy (cryoEM). These maps serve various purposes, including atomic model building, refinement, visualization, and validation. However, accurately comparing model-calculated maps to experimental data poses challenges, particularly because the resolution of cryoEM experimental maps varies across the map. Traditional crystallography methods generate finite-resolution maps with uniform resolution throughout the unit cell volume, while most modern cryoEM software employs Gaussian-like functions to generate these maps, an approach that does not adequately account for atomic model parameters and resolution. Recent work by Urzhumtsev & Lunin (2022, IUCr Journal, 9, 728-734) introduces a novel method for computing atomic model maps that incorporate local resolution and can be expressed as analytically differentiable functions of all atomic parameters. This approach enhances the accuracy of matching atomic models to experimental maps. In this paper, we detail the implementation of this method in CCTBX and Phenix. Synopsis: New tools implemented in CCTBX and Phenix allow the calculation of variable-resolution maps through a sum of atomic images expressed as analytic functions of all atomic parameters, along with their associated local resolution.

16
Identification, evolutionary history and characteristics of orphan genes in root-knot nematodes

Seckin, E.; Colinet, D.; Bailly-Bechet, M.; Seassau, A.; Bottini, S.; Sarti, E.; Danchin, E. G.

2026-04-11 bioinformatics 10.64898/2025.12.19.695360 medRxiv
Top 0.5%
0.1%

Orphan genes, lacking homologs in other species, are systematically found across genomes. Their presence may result from extensive divergence from pre-existing genes or from de novo gene birth, which occurs when a gene emerges from a previously non-genic region. In this study, we identified orphan genes in the genomes of globally distributed plant-parasitic nematodes of the genus Meloidogyne and investigated their origins, evolution, and characteristics. Using a comparative genomics framework across 85 nematode species, we found that 18% of Meloidogyne genes are genus-specific, transcriptionally supported orphans. By combining ancestral sequence reconstruction and synteny-based approaches, we inferred that 20% of these orphan genes originated through high divergence, while 18% likely emerged de novo. Proteomic and translatomic evidence confirmed the translation of a subset of these genes, and feature analyses revealed distinctive molecular signatures, including shorter length, signal peptide enrichment, and a tendency for extracellular localization. These findings highlight orphan genes as a substantial and previously underexplored component of the Meloidogyne genome, with potential roles in their worldwide parasitism.

17
Intertidal Gastropods (Gastropoda: Mollusca): Insights on diversity and distribution in the Mumbai Metropolitan Region, India

Ansari, R. M.; Patade, P.; Modi, S.

2026-03-30 zoology 10.1101/2025.11.20.689409 medRxiv
Top 0.5%
0.1%

Marine biodiversity documentation from the Mumbai Metropolitan Region (MMR) remains neglected despite the region having a diversity of marine coastal habitats. The region's intertidal zone is one such habitat where species documentation has remained heavily deficient due to a lack of assessments and general apathy towards the habitat. This study addresses the issue of data deficiency of one of the largest taxa, Gastropoda, through a decade-long citizen science project, Marine Life of Mumbai. There exist large gaps in taxonomic research that have led to inconsistencies in species identification and inadequate ecosystem representation. This study addresses these issues by focusing on one of the largest taxa, the molluscan class Gastropoda, within the MMR. We present the spatial distribution of gastropod assemblages from 28 rocky, sandy and muddy intertidal sites within the Mumbai Metropolitan Region, on the west coast of India. A total of 163 species were recorded from 2164 observations of marine gastropods. Among these, 29 species, 34 genera and one family, Limapontiidae, are new records for the region. Additionally, this study reports rediscoveries of 7 species from their type locality: 5 species of Heterobranchs recorded after 78 years; one species from Neogastropoda, Lataxiena bombayana, after 131 years; and one from Siphonariida, Siphonaria bassiensis, after 31 years. These species are herein illustrated with detailed morphological descriptions and their local distribution across the 28 sites in the Mumbai Metropolitan Region. Through this study we show that citizen science efforts and the subsequent taxonomic analysis provide an effective and low-cost method for filling data gaps from large, understudied geographical areas.

18
The Monophyly of Nycteria and Polychromophilus Parasites: A Missing Piece in the Evolution of Malaria and Other Haemosporida

Pacheco, M. A.; Schaer, J.; Werb, O.; Mello, B.; Escalante, A. A.

2026-04-10 evolutionary biology 10.64898/2026.04.07.717123 medRxiv
Top 0.5%
0.1%

Haemosporida is a diverse order of vector-borne apicomplexan parasites infecting terrestrial vertebrates worldwide, including humans, but the evolutionary relationships among its genera remain unresolved. The phylogenetic placement of two bat-restricted genera, Nycteria and Polychromophilus, both of which lack erythrocytic schizogony, has varied across studies depending on taxon sampling and marker choice. To address this problem, an expanded dataset of near-complete mitochondrial (mtDNA) genomes together with nine nuclear loci was analyzed. Phylogenetic analyses of mtDNA recovered Nycteria and Polychromophilus as a strongly supported monophyletic clade. In contrast, analyses based only on the three mitochondrial coding genes (CDS) or a reduced nuclear dataset failed to recover their monophyly and showed low support and extensive topological conflict at deeper nodes. These results indicate that near-complete mitochondrial genomes recover phylogenetic signal that is not captured by reduced mitochondrial coding sequences or partial nuclear datasets. Molecular dating analyses further showed that divergence estimates for a putative Nycteria-Polychromophilus clade are compatible with the proposed times for bat diversification, and consistent with the broader haemosporidian timescale. When the Nycteria-Polychromophilus clade was incorporated as a calibration prior, divergence-time estimates became more precise without altering the overall evolutionary timeframe. Substantial mitochondrial gene-order rearrangements in a distinct Nycteria lineage were confirmed, highlighting structural divergence within this bat-associated group. In addition, heterogeneity in rates across mtDNA haemosporidian lineages was observed. Together, these findings support the existence of a distinct bat-associated clade whose deeper placement and evolutionary significance should be tested with broader phylogenomic sampling.
Author Summary: Malaria parasites belong to a diverse group of organisms that infect many kinds of vertebrates, including birds, reptiles, and mammals (such as humans). Understanding how these parasites are related to each other is important for explaining how key biological traits have evolved. However, the relationships among major groups of haemosporidian parasites, including malaria parasites, remain unclear, particularly for those infecting bats. In this study, we focused on two groups of bat parasites, Nycteria and Polychromophilus, which share unusual biological features. The inferred evolutionary relationships of these two genera to other haemosporidians have been inconsistent across previous studies. By analyzing near-complete mitochondrial genomes, we found strong evidence that these two groups descended from a common evolutionary ancestor. In contrast, smaller datasets including nuclear genes failed to recover this relationship and produced conflicting results, suggesting that they lack sufficient information to resolve deep evolutionary relationships. We also found that this bat-associated lineage likely originated around the same time as early bats. In addition, we identified structural changes in the mitochondrial genome of one lineage, highlighting its evolutionary distinctiveness. Together, our results suggest that bats host a unique group of malaria parasites and demonstrate that more complete genetic data are essential for resolving their evolutionary history.

19
Gene Flow Creates Fuzzy Species Boundaries in Fence Lizards

Leache, A.; Davis, H.; Guerra, E.; Herrera, A.; Lemos-Espinal, J.; Fujita, M.; Myers, T. C.; Singhal, S.

2026-04-10 evolutionary biology 10.64898/2026.04.07.717035 medRxiv
Top 0.5%
0.1%

Species delimitation is a fundamental challenge in systematic biology, particularly for geographically variable taxa with hierarchical population structure and gene flow. Migration-aware coalescent models provide a powerful framework for investigating lineage divergence and accurately defining species boundaries. In this study, we combine statistical evaluations of gene flow with phylogenetic and population structure analyses to delimit species of fence lizards within the Sceloporus undulatus complex, a group characterized by extensive population subdivision, mitochondrial DNA introgression, and nuclear gene flow. We find that the undulatus complex exhibits uneven variation in genetic, morphological, and bioclimatic traits, resulting in variable distinctiveness among groups. In some cases, species boundaries are recognized by clear genetic discontinuities without gene flow. In others, shallow divergence, paraphyly, and gene flow produce leaky boundaries and fuzzy species limits. Mitochondrial introgression is extensive and concentrated at species boundaries, whereas nuclear gene flow occurs between only a few species and at much lower levels than within species. Neither within-species populations nor species are substantially diverged in morphological or bioclimatic space, highlighting the limited utility of these traits for diagnosing species in this group. By integrating estimates of gene flow with phylogenetic and population structure analyses, this study provides a robust and biologically meaningful revised taxonomic framework for the undulatus complex that identifies independently evolving lineages as species.

20
Col-Ovo: Smartphone-based artificial intelligence for rapid counting of Aedes mosquito eggs under field conditions

Almanza, J.; Montenegro, D.

2026-03-24 bioinformatics 10.64898/2026.03.19.712860 medRxiv
Top 0.6%
0.0%

Background: OviCol has recently been proposed as a disruptive strategy for the surveillance and control of synanthropic Aedes mosquitoes, vectors of dengue, Zika, and chikungunya viruses. The approach integrates monitoring and control through ultra-low-cost ovitraps (~0.2 USD), bioattractants, and egg inactivation using hot water. However, large-scale ovitrap surveillance generates thousands of egg substrates that require time-consuming manual counting, creating a major operational bottleneck. To address this limitation, we developed Col-Ovo, an artificial intelligence-based tool for automated counting of Aedes aegypti eggs from real field samples, together with OviLab, a digital platform for annotation, curation, and management of entomological image datasets. Methodology/Principal Findings: The detection model was trained using YOLOv11m on a dataset of 275 oviposition substrates (20.5 cm strips) collected under routine operational conditions. Images were captured in situ without preprocessing and included substrates heavily stained by bioattractants such as blackstrap molasses and dry yeast (Saccharomyces cerevisiae), as well as sand and particulate debris, reflecting realistic field conditions. The system was designed to operate with standard smartphone images and tolerate compression artifacts produced by messaging platforms such as WhatsApp. Performance was evaluated by comparing automated egg counts with expert manual counts and with virtual-human counts conducted in OviLab using >200% image magnification. Col-Ovo achieved >95% agreement with expert counts and 88% agreement with OviLab while reducing processing time from approximately 15 minutes to <3 seconds per sample. Conclusions/Significance: Col-Ovo enables rapid, scalable quantification of Ae. aegypti eggs from smartphone images, addressing a critical operational barrier in ovitrap-based surveillance.
The system requires no image preprocessing or specialized hardware and is accessible through a lightweight web interface supported by an AI architecture that allows retraining for new ecological contexts or additional Aedes species. Integrated with OviLab, this platform provides a flexible digital infrastructure that can strengthen routine vector surveillance and community-level control programs across regions where Aedes mosquitoes continue to expand.
Author Summary: Mosquitoes that transmit dengue, Zika, and chikungunya are expanding in many parts of the world. Monitoring their populations is essential for guiding prevention and control actions. A common surveillance method uses small traps where female mosquitoes lay their eggs. By counting the eggs collected in these traps, health programs can estimate mosquito abundance and detect increases in risk. However, the eggs are extremely small (about 0.065 mm²) and are usually counted manually under magnification. This process is slow, requires trained personnel, and limits how many samples can be analyzed in routine surveillance. In this study, we developed a digital tool that automatically counts mosquito eggs from photographs taken with a smartphone. The system was trained using images collected under real field conditions, including samples with stains, dirt, and other materials commonly found in mosquito traps. The tool can analyze images even when they are compressed and shared through WhatsApp. By reducing counting time from 15 minutes to only a few seconds, this approach can help strengthen mosquito surveillance and support faster responses to mosquito-borne disease risks.